LIQUID: A Framework for List Question Answering Dataset Generation

Abstract

Question answering (QA) models often rely on large-scale training datasets, which necessitates the development of a data generation framework to reduce the cost of manual annotations. Although several recent studies have aimed to generate synthetic questions with single-span answers, no study has been conducted on the creation of list questions with multiple, non-contiguous spans as answers. To address this gap, we propose LIQUID, an automated framework for generating list QA datasets from unlabeled corpora. We first convert a passage from Wikipedia or PubMed into a summary and extract named entities from the summarized text as candidate answers. This allows us to select answers that are semantically correlated in context and therefore suitable for constructing list questions. We then create questions using an off-the-shelf question generator with the extracted entities and the original passage. Finally, iterative filtering and answer expansion are performed to ensure the accuracy and completeness of the answers. Using our synthetic data, we significantly improve the performance of the previous best list QA models by exact-match F1 scores of 5.0 on MultiSpanQA, 1.9 on Quoref, and 2.8 averaged across three BioASQ benchmarks.
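The four stages described above can be illustrated with a minimal sketch. The sketch does not claim to be LIQUID's implementation. The summarizer, entity extractor, question generator and QA model are passed in as plain functions with hypothetical signatures. The concrete filtering rule and expansion rule are assumptions chosen for illustration:

- Filtering drops candidates that the QA model does not return, then regenerates the question.
- Expansion adds extra predicted spans that occur in the passage.

The paper's exact criteria may differ.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: each pipeline stage is an injected function.
Summarizer = Callable[[str], str]                    # passage -> summary
EntityExtractor = Callable[[str], List[str]]         # text -> named entities
QuestionGenerator = Callable[[str, List[str]], str]  # (passage, answers) -> question
QAModel = Callable[[str, str], List[str]]            # (question, passage) -> answer spans


def generate_list_qa(
    passage: str,
    summarize: Summarizer,
    extract_entities: EntityExtractor,
    generate_question: QuestionGenerator,
    answer: QAModel,
    max_iters: int = 5,
) -> Optional[Tuple[str, List[str]]]:
    """Return (question, answers) for one passage, or None if no list QA pair survives."""
    # Steps 1-2: summarize the passage, then use entities from the summary as
    # candidate answers, keeping only those that occur verbatim in the passage.
    summary = summarize(passage)
    candidates = [e for e in dict.fromkeys(extract_entities(summary)) if e in passage]
    if len(candidates) < 2:
        return None  # a list question needs at least two answers

    for _ in range(max_iters):
        # Step 3: generate a question from the passage and the current answers.
        question = generate_question(passage, candidates)
        predicted = answer(question, passage)
        # Step 4a (iterative filtering, assumed rule): drop candidates the QA
        # model does not return, then regenerate the question for the rest.
        kept = [c for c in candidates if c in predicted]
        if len(kept) < 2:
            return None
        if kept == candidates:
            break  # stable: every candidate answers the current question
        candidates = kept
    else:
        return None  # never stabilized within max_iters

    # Step 4b (answer expansion, assumed rule): add extra spans the QA model
    # predicts for the final question, provided they occur in the passage.
    extras = [p for p in dict.fromkeys(predicted) if p not in candidates and p in passage]
    return question, candidates + extras


if __name__ == "__main__":
    # Toy stand-ins for the real models, for illustration only.
    passage = "Paris, Berlin and Madrid are capitals. Rome is also a capital."
    result = generate_list_qa(
        passage,
        summarize=lambda p: p.split(". ")[0],
        extract_entities=lambda s: [w.strip(",.") for w in s.split() if w[0].isupper()],
        generate_question=lambda p, a: f"Which cities are capitals? ({len(a)} seeds)",
        answer=lambda q, p: ["Paris", "Berlin", "Rome"],
    )
    print(result)
```

With these toy stand-ins the summary yields the candidates Paris, Berlin and Madrid. Filtering removes Madrid, because the toy QA model never returns it. Expansion then adds Rome, so the run ends with `['Paris', 'Berlin', 'Rome']`. In a real setting, each lambda would be replaced by a pretrained model.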

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26529